ABSTRACT

In observational studies, the distribution of treatment assignments is
unknown, and therefore randomization tests are not generally applicable.
However, permutation tests that condition on sample information about the
treatment assignment mechanism can be applicable in observational studies,
provided that treatment assignment is strongly ignorable. These tests use the
conditional distribution of the treatment assignments given a sufficient
statistic for the unknown parameter of the propensity score. Several tests
that are commonly used in observational studies are particular instances of
this general procedure; moreover, conditional permutation tests and covariance
adjustment are closely related. A backtrack algorithm is developed to permit
efficient calculation of the exact conditional significance level, and two
approximations are discussed. A clinical study of treatments for lung cancer
is used to illustrate the technique. Conditional permutation tests extend
previous large sample results on the propensity score by providing a general
basis for exact inference in small observational studies when treatment
assignment is strongly ignorable.
SIGNIFICANCE AND EXPLANATION


An observational study is an attempt to draw inferences about the effects
of treatments from nonexperimental data. In an experiment, treatments are
assigned by the investigator, so it is possible to ensure that the units which
receive each treatment are comparable. In observational studies the units
receiving the two treatments may differ markedly, since the treatment
assignments were not under the control of the investigator. The current paper
develops two extensions of a standard method in experiments (Fisher's
randomization test) that are applicable in observational studies under
explicit assumptions. An algorithm is developed for computing the required
conditional permutation distribution.


The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the author of this report.


CONDITIONAL  PERMUTATION  TESTS  AND  THE 
PROPENSITY  SCORE  IN  OBSERVATIONAL  STUDIES 


Paul  R.  Rosenbaum* 

1. INTRODUCTION: Definitions; Fisher's Randomization Test

1.1. The Propensity Score in Observational Studies

The propensity score is the conditional probability of exposure to a particular
treatment given a vector of observed covariates. Properties of the propensity score, its
role in observational studies, and its relationship to various methods of bias reduction
are described by Rosenbaum and Rubin (1983a). The methods they propose are applicable in
large observational studies in which an estimate of the propensity score may substitute for
the population propensity score. The current paper shows that, under conditions defined in
§2, the propensity score can also provide a basis for exact inference in small
observational studies if a sufficient statistic exists for the unknown parameter of the
propensity score. Relevant notation and definitions from Rosenbaum and Rubin (1983a) are
briefly reviewed in §1.2 and §1.3, and related to Fisher's randomization test in §1.4.

1.2. The Structure of Studies for Treatment Effects

In the case of two treatments numbered 1 and 0, the $i$th of the $N$ units under
study has, in principle, both a response $r_{1i}$ that would have resulted if it had received
treatment 1, and a response $r_{0i}$ that would have resulted if it had received treatment
0. Treatment effects are defined to be comparisons of $r_{1i}$ and $r_{0i}$, such as
$r_{1i} - r_{0i}$. Each unit receives only one treatment, so either $r_{1i}$ or $r_{0i}$ is observed,
but not both. Therefore, inferences about the effects of treatments on single units, as
distinct from collections or populations of units, are largely speculative; inferences
about treatment effects are inherently statistical inferences. This structure is
consistent with that traditionally used in the literature of experimental design, for
example, in the books by Fisher (1935), Kempthorne (1952), and Cox (1958), and follows the
development for observational studies in Rubin (1974, 1977, 1978), Hamilton (1979),
Rosenbaum and Rubin (1983a, b) and Rosenbaum (1982). For further discussion of this
structure and some of its limitations, see Cox (1958, chapter 2), Rubin (1978, §2.3; 1980)
and Rosenbaum and Rubin (1983a, §1.1).

For the $i$th unit of $N$ units in the study ($i = 1, \ldots, N$), let $z_i$ be the indicator
for treatment assignment, with $z_i = 1$ if unit $i$ is assigned to treatment 1, and $z_i = 0$
if unit $i$ is assigned to treatment 0. Let $x_i$ be a vector of observed pretreatment
measurements or covariates for the $i$th unit; all of the measurements in $x$ were made
prior to treatment assignment, but $x$ may not include all covariates used to make
treatment assignments. The propensity score, $e(x)$, is the conditional probability of
assignment to treatment 1 given the observed covariates, that is,

$$ e(x) = \mathrm{pr}(z = 1 \mid x), $$

where it is assumed that

$$ \mathrm{pr}(z_1, \ldots, z_N \mid x_1, \ldots, x_N) = \prod_{i=1}^{N} e(x_i)^{z_i} \{1 - e(x_i)\}^{1 - z_i}. \tag{1.1} $$

Although this strict independence assumption is not essential, it simplifies notation and
discussion.

1.3 A Critical Assumption: Strongly Ignorable Treatment Assignment

Randomized and nonrandomized trials differ in two ways. First, in a randomized trial,
the propensity score is a known function, whereas, in an observational study, the propensity
score function is almost always unknown. Second, with properly collected data in a
randomized trial, $x$ is known to contain all covariates that are both used to assign
treatments and possibly related to the response $(r_1, r_0)$. More formally, in a randomized
trial, treatment assignment $z$ and response $(r_1, r_0)$ are known to be conditionally
independent given $x$, or in Dawid's (1979) notation,

$$ (r_1, r_0) \perp\!\!\!\perp z \mid x. \tag{1.2} $$

This condition is usually not known to hold in a nonrandomized experiment. Moreover, in a
randomized experiment, every unit in the experiment has a chance of receiving each
treatment. Following Rosenbaum and Rubin (1983a, b), treatment assignment will be said to
be strongly ignorable if (1.2) holds and $0 < \mathrm{pr}(z = 1 \mid x) < 1$ for all $x$. (As the term
suggests, strong ignorability is a somewhat more restrictive condition than ignorability as
defined by Rubin (1978).)

The assumption of strongly ignorable treatment assignment plays a critical role in
inference from observational studies (e.g., Rosenbaum and Rubin 1983a, Theorem 4, and §2
below), and therefore the correctness and implications of this assumption will generally
require investigation in each observational study. In the case of binary responses
$(r_1, r_0)$, Rosenbaum and Rubin (1983b) describe a method for assessing the sensitivity of
conclusions to certain departures from strong ignorability. Rosenbaum (1982) reviews
methods of testing the assumption of strong ignorability. Related discussion within a
Bayesian framework is given by Rubin (1978, §4).

1.4 Fisher's Randomization Test


Fisher's (1935) randomization test examines the sharp null hypothesis of zero
difference in the effects of the treatments for each experimental unit, that is,

$$ H_0: r_{1i} = r_{0i} \quad \text{for } i = 1, 2, \ldots, N. \tag{1.3} $$

Note that the sharp null hypothesis (1.3) states that the same response would have been
observed from each unit had it received the alternative treatment.

Let $r_{zi}$ be the observed response for unit $i$, that is, $r_{zi} = z_i r_{1i} + (1 - z_i) r_{0i}$,
and denote the vector of observed responses by $r_z = (r_{z1}, r_{z2}, \ldots, r_{zN})^T$, and the
vector of treatment assignments by $z = (z_1, z_2, \ldots, z_N)^T$. Let $t(z, r_z)$ be a statistic
chosen to measure departures from the sharp null hypothesis (1.3); for example, $t(z, r_z)$
might be the difference in sample mean responses to the two treatments, that is,

$$ t(z, r_z) = \frac{z^T r_z}{1^T z} - \frac{(1 - z)^T r_z}{1^T (1 - z)}, \tag{1.4} $$

where $1$ is an $N$ dimensional vector of 1's. Alternatively, $t(z, r_z)$ could be the
difference between two robust measures of the typical response in the two treatment
groups. Fisher proposed testing the sharp null hypothesis (1.3) using the tails of the
permutation distribution of $t(z, r_z)$ induced by the randomization, where $r_z$ is, in a
sense, treated as a constant. Fixing $r_z$ at its observed value in this way is equivalent
to conditioning on $r_z$ in a randomized experiment, but not generally in an observational
study. As a result, randomization tests applied in randomized experiments have the correct
size given the observed $r_z$, and therefore the correct unconditional size regardless of the
distribution of $r_z$ (e.g., Lehmann 1959, chapter 5; Rubin 1980). In observational studies,
a different type of conditioning may be required.

In order to motivate the general discussion in §2, it is useful to briefly review,
with the current notation, the justification for conditioning on the observed response,
$r_z$, in Fisher's randomization test. In a completely randomized experiment, the treatment
assignment, $z$, has a known distribution that is independent of the response $(r_1, r_0)$,
and, moreover, $0 < \mathrm{pr}(z = 1) < 1$, so treatment assignment is strongly ignorable without any
covariates, that is, with $x$ equal to a null vector. The distribution of the observed
response, $r_z$, generally depends on $z$; however, under the sharp null hypothesis (1.3),
the observed response satisfies $r_z = r_1 = r_0$, so treatment assignment, $z$, is independent
of the observed response, $r_z$. Therefore, under (1.3), the conditional distribution of the
treatment assignments given the observed responses, $\mathrm{pr}(z \mid r_z)$, is equal to the marginal
randomization distribution of the treatment assignments, $\mathrm{pr}(z)$. Hence, under the sharp
null hypothesis (1.3), the conditional distribution of the test statistic, $t(z, r_z)$, given
the value of the observed responses, $r_z = c$ say, equals the permutation distribution of
$t(z, c)$ induced by the randomization. Formally, for each constant $c$,

$$ \mathrm{pr}\{t(z, r_z) \mid r_z = c\} = \mathrm{pr}\{t(z, c) \mid r_z = c\} = \mathrm{pr}\{t(z, c)\} \tag{1.5} $$

from (1.3) and (1.2) with $x$ equal to a null vector. The conclusion that Fisher's test
has the correct conditional size given the observed $r_z$, and therefore also the correct
unconditional size, is an immediate consequence of (1.5).

Notice that the justification for Fisher's test rests on two conditions. First, the
randomization or permutation distribution of $z$, and therefore also of $t(z, c)$ for each
constant $c$, is known, since it is created by the experimenter. Second, treatment
assignment is strongly ignorable without covariates, so under the sharp null hypothesis,
the known permutation distribution of $t(z, c)$ equals the relevant conditional
distribution of $t(z, r_z)$ given the observed responses, $r_z = c$, that is, condition (1.5)
holds. In observational studies, even if treatment assignment is strongly ignorable, the
distribution of treatment assignments is generally unknown, and therefore Fisher's
randomization test is not generally applicable.
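The two conditions can be made concrete with a minimal Python sketch of Fisher's randomization test for a completely randomized experiment, using the difference in means (1.4) as the statistic. Under the sharp null hypothesis the observed responses are held fixed, as (1.5) permits, and every assignment with the observed number of treated units is treated as equally likely. The function names and the four-unit data set are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def diff_in_means(b, r):
    """Statistic (1.4): mean response of units with b_i = 1 minus mean of units with b_i = 0."""
    treated = [ri for bi, ri in zip(b, r) if bi == 1]
    control = [ri for bi, ri in zip(b, r) if bi == 0]
    return Fraction(sum(treated)) / len(treated) - Fraction(sum(control)) / len(control)


def fisher_randomization_pvalue(z, r, stat=diff_in_means):
    """One-sided p-value of Fisher's randomization test under the sharp null (1.3).

    The observed responses r are held fixed, and every assignment with the observed
    number of treated units is equally likely (complete randomization is assumed).
    """
    n, n1 = len(z), sum(z)
    t_obs = stat(z, r)
    extreme = total = 0
    for chosen in combinations(range(n), n1):
        b = [1 if i in chosen else 0 for i in range(n)]
        total += 1
        if stat(b, r) >= t_obs:
            extreme += 1
    return Fraction(extreme, total)


# Hypothetical toy data (not from the paper): four units, the first two treated.
z = [1, 1, 0, 0]
r = [3, 5, 1, 2]
print(diff_in_means(z, r), fisher_randomization_pvalue(z, r))  # 5/2 1/6
```

Exact fractions are used so that ties in the statistic are compared without rounding error; only one of the six possible assignments gives a difference as large as the observed 5/2.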

2. Conditional Permutation Tests Under a Logistic Model for the Propensity Score
2.1 A Basic Theorem

This section shows that Fisher's randomization test may be extended so that it is
applicable in observational studies provided that (a) treatment assignment is strongly
ignorable, and (b) the propensity score follows a logistic model (Cox 1970), that is,

$$ \log\left\{\frac{e(x)}{1 - e(x)}\right\} = \beta^T f(x), \tag{2.1} $$

where $\beta$ is an unknown vector parameter, and $f(\cdot)$ is a known, vector-valued function of
$x$, such as $f(x) = (1, x^T)^T$. Since $f(x)$ may include polynomial terms in $x$, condition
(2.1) is not particularly restrictive. Let $F$ and $X$ be the matrices whose $N$ rows are,
respectively, the values of $f(x_i)^T$ and $x_i^T$, $i = 1, 2, \ldots, N$. By a familiar argument
(Cox 1970, §4.2), $z^T F$ is sufficient for $\beta$ in (2.1).

The proposed test is similar to a randomization test, but with a nuisance parameter,
$\beta$, describing the treatment assignment mechanism. To eliminate the nuisance parameter, we
use the conditional distribution of the treatment assignments $z$ given the sufficient
statistic, $z^T F$, for $\beta$. Unlike a randomization test, this conditional test compares the
observed test statistic, $t(z, r_z)$, to the value, $t(b, r_z)$, that would have been obtained
under the null hypothesis with a different treatment assignment, indicated by the binary
vector $b$, only if $b$ is similar to the observed treatment assignment in the sense that
$b^T F = z^T F$. Clearly, alternative treatment assignments satisfying $b^T F = z^T F$ will
typically exist only when the values in $F$ are fairly coarse; see, for example, §3.

Theorem 1. Suppose the propensity score follows the logistic model (2.1).

(A) Then the conditional distribution of treatment assignments $z$ given $(z^T F, X)$ is free
of unknown parameters and assigns the same probability to each binary vector $b$ satisfying
$b^T F = z^T F$.

(B) Under the sharp null hypothesis (1.3), if treatment assignment is strongly ignorable,
then the conditional distribution of the test statistic, $t(z, r_z)$, given $r_z = c$ and
$(z^T F, X)$ equals the known conditional permutation distribution of $t(z, c)$ given
$(z^T F, X)$ that is determined from part (A); i.e.,

$$ \mathrm{pr}\{t(z, r_z) \mid r_z = c, z^T F, X\} = \mathrm{pr}\{t(z, c) \mid z^T F, X\} $$

for each constant $c$.

(C) If treatment assignment is strongly ignorable, and if, for each fixed $c$ and $a$, the
set $W(c, a)$ satisfies $\mathrm{pr}\{t(z, c) \in W(c, a) \mid z^T F = a, X\} = \alpha$ (respectively $\le \alpha$) under the
sharp null hypothesis (1.3), then a test which rejects whenever $t(z, r_z) \in W(r_z, z^T F)$ has
level $\alpha$ (respectively $\le \alpha$) for all values of the unknown parameter $\beta$.

Proof: Part A is straightforward since $z^T F$ is sufficient for $\beta$ in (2.1). To prove
part B, note that strong ignorability and the sharp null hypothesis (1.3) imply

$$ z \perp\!\!\!\perp r_z \mid X $$

and hence, essentially following Lemma 4.2(ii) of Dawid (1979),

$$ z \perp\!\!\!\perp r_z \mid (z^T F, X), $$

since $F$ is a function of $X$. Therefore, strong ignorability and the null hypothesis
imply

$$ \begin{aligned} \mathrm{pr}\{t(z, r_z) \mid r_z = c, z^T F, X\} &= \mathrm{pr}\{t(z, c) \mid r_z = c, z^T F, X\} \\ &= \mathrm{pr}\{t(z, c) \mid z^T F, X\} \end{aligned} \tag{2.2} $$

as required for part B. Part C follows immediately from (2.2). //
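Part A can be checked numerically. Under (1.1) and (2.1), $\mathrm{pr}(z = b \mid X) = \exp(\beta^T F^T b) \prod_i \{1 - e(x_i)\}$, which depends on $b$ only through $b^T F$. The Python sketch below evaluates this probability for an arbitrary, randomly drawn $\beta$ (standing in for the unknown parameter) and confirms that all assignments with $b^T F = z^T F$ are equally likely. It uses the ten units of the artificial example in §2.2, with covariate values that are an assumption inferred from Table 2; the helper names are hypothetical.

```python
import math
import random
from itertools import product


def assignment_probability(b, F, beta):
    """pr(z = b | X) under (1.1) with logit e(x_i) = beta^T f(x_i), as in (2.1)."""
    p = 1.0
    for bi, fi in zip(b, F):
        e = 1.0 / (1.0 + math.exp(-sum(bm * fm for bm, fm in zip(beta, fi))))
        p *= e if bi == 1 else 1.0 - e
    return p


def sufficient_statistic(b, F):
    """b^T F, one entry per column of F."""
    return tuple(sum(bi * fi[m] for bi, fi in zip(b, F)) for m in range(len(F[0])))


# f(x_i) = (1, x_i) for units A..J of the artificial example in §2.2
# (x values inferred from Table 2); units A and B received treatment 1.
x = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
F = [(1, xi) for xi in x]
z = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

random.seed(1)
beta = (random.uniform(-2, 2), random.uniform(-2, 2))  # stands in for the unknown beta
target = sufficient_statistic(z, F)
probs = [assignment_probability(b, F, beta)
         for b in product((0, 1), repeat=len(x))
         if sufficient_statistic(b, F) == target]
# Every b with b^T F = z^T F has the same probability, whatever beta is, so the
# conditional distribution given z^T F is uniform (Theorem 1(A)).
print(len(probs), max(probs) / min(probs))
```

Changing the seed changes the individual probabilities but not the conclusion: the ratio of the largest to the smallest stays at 1 up to floating point rounding.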

If the treatment effect is constant in the sense that $r_{1i} = r_{0i} + \Delta$, or that
$r_1 = r_0 + \Delta 1$, for some scalar $\Delta$, then a confidence interval for $\Delta$ may be constructed by
inverting the test (Lehmann 1959, §5.4). The test described by Theorem 1 will, however,
have the nominal level whether or not the treatment effect is constant.
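Inverting the test can be sketched as follows, assuming a constant effect $\Delta$. Under $H_\Delta$ the control responses $r_0 = r_z - \Delta z$ do not depend on the assignment, so the sharp null machinery applies to them, and the values of $\Delta$ that are not rejected form the confidence set. The sketch uses a one-sided test based on the treatment 1 total (large totals point to an effect above $\Delta$), so it yields a lower confidence bound, evaluated here only on a small grid. It brute-forces the conditional sample space rather than backtracking, and reuses the artificial data of §2.2 with covariate values inferred from Table 2; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_pvalue(z, r, F, stat):
    """One-sided conditional permutation p-value of Theorem 1: stat(z, r) is compared
    with stat(b, r) over all binary b with b^T F = z^T F, each equally likely."""
    def suff(b):
        return tuple(sum(bi * fi[m] for bi, fi in zip(b, F)) for m in range(len(F[0])))

    target, t_obs = suff(z), stat(z, r)
    extreme = total = 0
    for b in product((0, 1), repeat=len(z)):  # brute force over 2^N vectors; fine for small N
        if suff(b) == target:
            total += 1
            if stat(b, r) >= t_obs:
                extreme += 1
    return Fraction(extreme, total)


def treated_total(b, r):
    return sum(bi * ri for bi, ri in zip(b, r))


def accepted_effects(z, r, F, deltas, alpha):
    """Constant effects Delta (r_1 = r_0 + Delta) not rejected at level alpha.

    Under H_Delta the control responses are r_0 = r_z - Delta * z, which are then
    tested as in (1.3); large treated totals are evidence of an effect above Delta.
    """
    kept = []
    for d in deltas:
        r0 = [ri - d * zi for ri, zi in zip(r, z)]
        if conditional_pvalue(z, r0, F, treated_total) > alpha:
            kept.append(d)
    return kept


# Artificial example of §2.2 (x inferred from Table 2): r_z = 5x + z.
x = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
z = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
F = [(1, xi) for xi in x]
r = [5 * xi + zi for xi, zi in zip(x, z)]
print(accepted_effects(z, r, F, [0, 0.5, 1, 1.5, 2], alpha=0.05))  # [1, 1.5, 2]
```

On this grid the one-sided .05 test rejects $\Delta = 0$ and $\Delta = 0.5$ (conditional p-value $1/25$) and accepts every $\Delta \ge 1$, consistent with the true effect of 1 used to generate the data.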

2.2 An Artificial Example

The following artificial example is intended to clarify the procedure described in
§2.1 and to simplify discussion of the backtrack algorithm in §2.3; a practical example
appears in §3. The data in Table 1 were generated by setting

$$ r_{0i} = 5 x_i \quad \text{and} \quad r_{1i} = 5 x_i + 1, $$

where the treatment effect $r_{1i} - r_{0i}$ equals 1 for each unit, and the observable response
is

$$ r_{zi} = 5 x_i + z_i. $$

Two units received treatment 1, and eight units received treatment 0.

The test statistic used here is the sample total in the treatment 1 group, that is,
$t(z, r_z) = z^T r_z$. It is straightforward to show that the critical region induced by this
statistic is the same as the critical region induced by the difference in sample means
(1.4). See Kempthorne (1952) for details.
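The equivalence follows from a one-line rearrangement of (1.4). For any reassignment $b$ with the same number $n_1 = 1^T b$ of treated units (here $n_1 = 2$ and $n_0 = N - n_1 = 8$),

$$ t(b, r_z) = \frac{b^T r_z}{n_1} - \frac{(1 - b)^T r_z}{n_0} = \left(\frac{1}{n_1} + \frac{1}{n_0}\right) b^T r_z - \frac{1^T r_z}{n_0}. $$

Since $1^T r_z$ is fixed once the observed responses are held fixed, the difference in means is a strictly increasing function of $b^T r_z$, so the two statistics order the reassignments identically.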

Together, the two parts of Table 2 list the elements of the sample space associated
with Fisher's randomization test. There are $\binom{10}{2} = 45$ elements in the sample space,
corresponding to the $\binom{10}{2}$ ways of selecting the two units that will receive treatment
1. Eleven of the 45 treatment reassignments produce response totals greater than or equal
to the observed response total of $z^T r_z = 7$, so Fisher's one-sided significance level is
$11/45 = .24$.


In an observational study, a logistic model (2.1) for the propensity score with
$f(x_i) = (1, x_i)^T$ would lead us to restrict attention to treatment assignments, $b$, that
are similar to the observed treatment assignment in the sense that $b^T F = z^T F = (2, 1)$,
that is, treatment assignments in which the treatment group includes one unit with $x_i = 1$
and one unit with $x_i = 0$. This conditional sample space contains the 25 elements in
the top half of Table 2. The observed treatment total, $z^T r_z = 7$, is the largest of the 25
treatment totals from the conditional sample space, so the conditional one-sided
significance level is $1/25 = .04$. In this instance, the one-sided .05 level conditional
critical region, $W(r_z, z^T F)$, contains only the observed treatment total. By Theorem 1,
this test would have the nominal level if treatment assignment is strongly ignorable and
model (2.1) holds.
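Both significance levels can be reproduced with a short Python sketch. Table 1 is not reproduced in this section, so the covariate values below are an assumption inferred from Table 2 and the generating model: units A, C, D, E, F have $x_i = 1$, units B, G, H, I, J have $x_i = 0$, and A and B received treatment 1, giving observed responses 6, 1, 5, 5, 5, 5, 0, 0, 0, 0. These values reproduce every total listed in Table 2.

```python
from fractions import Fraction
from itertools import combinations

units = "ABCDEFGHIJ"
# Covariate and observed responses of the artificial example, as implied by Table 2
# and the generating model r_zi = 5 x_i + z_i (units A and B received treatment 1).
x = dict(zip(units, [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]))
treated_obs = {"A", "B"}
r = {u: 5 * x[u] + (u in treated_obs) for u in units}


def total(pair):
    """Treatment 1 total b^T r_z when the units in `pair` are treated."""
    return sum(r[u] for u in pair)


t_obs = total(treated_obs)
pairs = list(combinations(units, 2))  # Fisher's sample space: C(10, 2) = 45
# Conditional sample space: b^T F = z^T F = (2, 1), i.e. one treated unit with x = 1.
conditional = [p for p in pairs
               if sum(x[u] for u in p) == sum(x[u] for u in treated_obs)]

p_fisher = Fraction(sum(total(p) >= t_obs for p in pairs), len(pairs))
p_cond = Fraction(sum(total(p) >= t_obs for p in conditional), len(conditional))
print(t_obs, len(pairs), p_fisher, len(conditional), p_cond)  # 7 45 11/45 25 1/25
```

Because every reassignment treats exactly two units, the first column of $F$ is matched automatically, and only the count of treated units with $x_i = 1$ has to be checked.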


In a completely randomized experiment, both the unconditional and the conditional
tests have the nominal level, although the conditional test performs a kind of covariance
adjustment; see §2.5. However, in an observational study, only the conditional test can be
used, because the distribution of treatment assignments generally depends on unknown
parameters.


TABLE 2.
The Conditional and Unconditional Permutational Sample Space

Elements of the Conditional Sample Space

Units*  z^T r      Units  z^T r      Units  z^T r
AB      7          CG     5          EH     5
AG      6          CH     5          EI     5
AH      6          CI     5          EJ     5
AI      6          CJ     5          FG     5
AJ      6          DG     5          FH     5
BC      6          DH     5          FI     5
BD      6          DI     5          FJ     5
BE      6          DJ     5
BF      6          EG     5

Additional Elements of the Unconditional Sample Space

Units*  z^T r      Units  z^T r
AC      11         DF     10
AD      11         EF     10
AE      11         GH     0
AF      11         GI     0
BG      1          GJ     0
BH      1          HI     0
BI      1          HJ     0
BJ      1          IJ     0
CD      10
CE      10
CF      10
DE      10

* Letter pairs indicate the units receiving treatment 1.

2.3 A Backtrack Algorithm

In general, calculation of the conditional significance level requires identification
of all binary vectors $b$ such that $b^T F = z^T F$. For small $N$, this task is not as
difficult as one might suppose, provided we avoid checking most of the $2^N$ possible
binary vectors. This section describes an efficient but easily implemented backtrack
algorithm. For a general discussion of backtrack algorithms, see Whitehead (1973, §2.3) or
Horowitz and Sahni (1978, chapter 7).

Each binary vector, $b$, is a path through $N + 1$ nodes of a binary tree; see Figure
1. We begin at the root of the tree, exploring each branch until it becomes apparent that
no $b$ in that branch will satisfy $b^T F = z^T F$. If we abandon a branch at a node at
level $k$, then we have eliminated $2^{N-k+1}$ of the $2^N$ possible binary vectors.

Without loss of generality, we may assume that $F$ is nonnegative, that is,
$f_{im} \ge 0$ for $i = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$, where $f_{im}$ is the element in the $i$th row
and $m$th column of $F$. Suppose we are at a node at level $k + 1$ defined by
$(b_1, b_2, \ldots, b_k)$, where $k < N$. A simple rule is to abandon the branch beginning at this
node if, for some $m$, $1 \le m \le M$, either

$$ \sum_{i=1}^{k} b_i f_{im} + \sum_{i=k+1}^{N} f_{im} < \sum_{i=1}^{N} z_i f_{im} \tag{2.3} $$

or

$$ \sum_{i=1}^{k} b_i f_{im} > \sum_{i=1}^{N} z_i f_{im}. \tag{2.4} $$

If condition (2.3) holds at a node at level $k + 1$, then $\sum_{i=1}^{k} b_i f_{im}$ is already too
small, in the sense that every binary vector $b$ whose first $k$ coordinates correspond to
the given node of the tree will satisfy

$$ \sum_{i=1}^{N} b_i f_{im} < \sum_{i=1}^{N} z_i f_{im}. $$

Similarly, if (2.4) holds, then $\sum_{i=1}^{k} b_i f_{im}$ is already too large. The procedure is
illustrated in Figure 1 using the artificial data from §2.2.


If several units have identical values of the vector $f(x)$, then a more efficient
algorithm may be constructed. Suppose $f(x_u) = f(x_v)$ for some $u < v$. If
$b = (b_1, b_2, \ldots, b_u, \ldots, b_v, \ldots, b_N)$ solves $b^T F = z^T F$, then so does
$b^* = (b_1, b_2, \ldots, b_v, \ldots, b_u, \ldots, b_N)$, where $b^*$ is obtained from $b$ by interchanging $b_u$ and
$b_v$. Therefore, the solutions of $b^T F = z^T F$ may be partitioned into equivalence classes,
where $b$ and $b^*$ are in the same equivalence class if $b^*$ may be obtained from $b$ by
permuting coordinates of $b$ associated with identical values of $f(x)$. To obtain all
solutions of $b^T F = z^T F$, it is sufficient to obtain one $b$ from each equivalence class of
solutions using a backtrack algorithm, and then to obtain the other members of the same
equivalence class by appropriate permutations of the coordinates of $b$. This procedure is
a version of isomorph rejection; see Whitehead (1973, §2.4). To obtain one $b$ from each
equivalence class, we may use a backtrack algorithm that abandons a branch at a node at
level $k + 1$ if (2.3) or (2.4) holds, or if

$$ b_k > b_u \quad \text{for some } u < k \text{ such that } f(x_u) = f(x_k). \tag{2.5} $$

In a backtrack algorithm, additional conditions such as (2.5) generally reduce the
number of branches that require investigation, thereby generally increasing efficiency. In
special cases, more efficient methods are available; see §2.4. Approximations to the null
distribution of the test statistic are given in §4, and other related large sample
procedures are described by Rosenbaum and Rubin (1983a, §3).
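The search described above can be sketched in Python as a recursive depth-first traversal. This is an illustrative implementation under the section's assumption that every entry of $F$ is nonnegative, not the author's program; the names `backtrack_solutions` and `class_size` are hypothetical. At each node the sketch checks (2.3) and (2.4) for every column, and, when isomorph rejection is requested, refuses to set $b_k = 1$ after an earlier unit with the same row of $F$ was set to 0, which is rule (2.5). The size of each equivalence class is then a product of binomial coefficients, one per group of identical rows.

```python
from collections import Counter
from math import comb, prod


def backtrack_solutions(F, target, isomorph_rejection=False):
    """All binary b with b^T F = target, found depth-first as in §2.3.

    Assumes every entry of F is nonnegative. A branch at the node (b_1, ..., b_k) is
    abandoned when (2.3) or (2.4) holds; with isomorph_rejection=True it is also
    abandoned when (2.5) holds, so only one b per equivalence class is returned.
    """
    N, M = len(F), len(F[0])
    keys = [tuple(row) for row in F]
    # tail[k][m] = sum of f_im over the undecided units i >= k (0-based indexing)
    tail = [[0] * M for _ in range(N + 1)]
    for k in range(N - 1, -1, -1):
        tail[k] = [tail[k + 1][m] + F[k][m] for m in range(M)]

    solutions, b, partial, zeros_seen = [], [], [0] * M, Counter()

    def visit(k):
        for m in range(M):
            if partial[m] > target[m]:               # (2.4): already too large
                return
            if partial[m] + tail[k][m] < target[m]:  # (2.3): too small even if all later b_i = 1
                return
        if k == N:
            solutions.append(tuple(b))  # both checks passed, so b^T F == target
            return
        for bit in (1, 0):
            if bit and isomorph_rejection and zeros_seen[keys[k]]:
                continue  # (2.5): b_k = 1 after an earlier b_u = 0 with f(x_u) = f(x_k)
            b.append(bit)
            zeros_seen[keys[k]] += 1 - bit
            for m in range(M):
                partial[m] += bit * F[k][m]
            visit(k + 1)
            for m in range(M):
                partial[m] -= bit * F[k][m]
            zeros_seen[keys[k]] -= 1 - bit
            b.pop()

    visit(0)
    return solutions


def class_size(b, F):
    """Number of solutions obtained from b by permuting coordinates with equal rows of F."""
    n = Counter(tuple(row) for row in F)
    ones = Counter(tuple(row) for row, bi in zip(F, b) if bi)
    return prod(comb(n[key], ones[key]) for key in n)


# Artificial example of §2.2 (x inferred from Table 2): f(x_i) = (1, x_i), z^T F = (2, 1).
x = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
F = [(1, xi) for xi in x]
all_solutions = backtrack_solutions(F, (2, 1))
representatives = backtrack_solutions(F, (2, 1), isomorph_rejection=True)
print(len(all_solutions), len(representatives),
      sum(class_size(b, F) for b in representatives))  # 25 1 25
```

With only two distinct rows of $F$ in the artificial example, isomorph rejection collapses the 25 solutions into a single representative (A and B treated), whose class contains $\binom{5}{1}\binom{5}{1} = 25$ members.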

2.4 Standard Tests for Observational Studies Derived as Conditional Tests Given a
Sufficient Statistic for the Propensity Score

This section shows that several commonly used tests can be viewed as conditional
permutation tests given a sufficient statistic for the propensity score. In these tests,
the response $r_{zi}$ is a discrete random variable taking one of $R$ possible values.

In the Mantel-Haenszel (1959) and Mantel (1963) approximate procedures and the
corresponding exact procedures given by Birch (1964, 1965) and Cox (1966), there are $M$
subclasses, resulting in an $R \times 2 \times M$ contingency table (i.e., observed response $r_z$ by
treatment $z$ by subclass $x$). Define $F$ so that $f_{im} = 1$ if unit $i$ falls in
subclass $m$, and $f_{im} = 0$ otherwise, so that under the logit model (2.1), the unknown
conditional probability of assignment to treatment 1 given the covariates $x$ is constant
within each subclass. If treatment assignment is strongly ignorable, then it follows from
Theorem 1(B) that, under the null hypothesis (1.3), conditioning on $(r_z, z^T F, X)$ restricts
the sample space to those $R \times 2 \times M$ tables in which each of the $M$ subtables has the same
margins as the observed table. This leads to the Birch-Cox exact distributions and the
Mantel-Haenszel and Mantel approximations. If treatment assignment is strongly ignorable,
subclassification on the propensity score can produce subclasses with the properties
required by Theorem 1. For discussion of subclassification on the propensity score, see
Rosenbaum and Rubin (1983a, §3.3).

McNemar's (1947) test for paired binary responses is the special case in which each
subclass has just two units, with one receiving each treatment. The pairs are typically
constructed by matched sampling (e.g., Rubin 1973). With $F$ indicating the pairs, model (2.1)
implies that units have been selected by matched sampling from a population of treated and
control units in such a way that the conditional probability of assignment to treatment 1
given covariates $x$ is constant within each pair. If treatment assignment is strongly
ignorable, then matched sampling of treated and control units with the same value of the
propensity score can produce matched pairs with the properties required by Theorem 1. For
discussion of propensity matching, see Rosenbaum and Rubin (1983a, §3.2).
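When $F$ consists of subclass indicators, conditioning on $z^T F$ fixes the number treated in each subclass, so under (1.3) the treated units within each subclass form a uniformly chosen subset of that size, independently across subclasses. The exact null distribution of the treatment 1 total is therefore the convolution of the within-subclass permutation distributions, which avoids a search over all $N$ units. The Python sketch below is in the spirit of the Birch-Cox exact calculations, not a reproduction of their published algorithms. Its first use treats the two $x$ groups of the artificial example in §2.2 (responses inferred from Table 2) as subclasses; its second uses hypothetical matched pairs, the McNemar setting, where concordant pairs contribute a fixed amount and each discordant pair contributes 0 or 1 with probability 1/2.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def subclass_null_distribution(subclasses):
    """Exact null distribution of the treated total z^T r_z when F holds subclass indicators.

    `subclasses` is a list of (responses, number_treated) pairs. Under (1.3) the treated
    set in each subclass is a uniformly chosen subset of the observed size, independently
    across subclasses, so the overall distribution is a convolution.
    """
    dist = {0: Fraction(1)}
    for responses, n_treated in subclasses:
        subsets = list(combinations(responses, n_treated))
        within = defaultdict(Fraction)
        for s in subsets:
            within[sum(s)] += Fraction(1, len(subsets))
        combined = defaultdict(Fraction)
        for t1, p1 in dist.items():
            for t2, p2 in within.items():
                combined[t1 + t2] += p1 * p2
        dist = dict(combined)
    return dist


# Artificial example of §2.2: subclasses x = 1 and x = 0, one treated unit in each.
artificial = subclass_null_distribution([([6, 5, 5, 5, 5], 1), ([1, 0, 0, 0, 0], 1)])
print(sorted(artificial.items()))  # totals 5, 6, 7 with probabilities 16/25, 8/25, 1/25

# Hypothetical matched pairs with binary responses; one unit of each pair is treated.
pairs = [([1, 1], 1), ([0, 0], 1), ([1, 0], 1), ([1, 0], 1), ([0, 1], 1)]
print(sorted(subclass_null_distribution(pairs).items()))
```

For the artificial example this reproduces the top half of Table 2 (one total of 7, eight of 6, sixteen of 5), and for the pairs it gives 1 plus a Binomial(3, 1/2) count, the exact distribution behind McNemar's test.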

2.5 Conditional Permutation Tests and Covariance Adjustment

This section examines the relationship between conditional permutation tests and
covariance adjustment. In §1.4 and §2.2, the difference in sample means (1.4),
or equivalently the treatment 1 total, $z^T r_z$, was used as a test statistic, $t(z, r_z)$. An
alternative test statistic is the difference in means after covariance adjustment for $F$,
that is, the first coordinate of the estimated coefficient vector in the least squares
regression of $r_z$ on $(z, F)$. The randomization distributions (§1.4) of these two test
statistics can lead to markedly different conclusions. We now show that the conditional
permutation tests (§2.1) based on these two statistics lead to identical critical regions,
and therefore to identical tests and confidence intervals, provided that model (2.1)
includes a constant term (or, formally, provided that the column rank of $(1, F)$ equals the
column rank of $F$). In a sense, the conditional test performs a covariance adjustment;
however, the test has the nominal level even if the linear regression model is incorrect.


To prove the equivalence of conditional tests based on the two test statistics, it is
sufficient to show that the covariance adjusted difference, $t^*(b, r_z)$ say, is a strictly
monotone function of $b^T r_z$ for each treatment assignment, $b$, in the conditional sample
space, that is, for $b$ such that $b^T F = z^T F$. Without loss of generality, assume $F$ is
of full column rank. Familiar arguments (e.g., Seber 1977, p. 65) show that the covariance
adjusted difference with treatment assignment $b$ is

$$ t^*(b, r_z) = \frac{r_z^T (I - P) b}{b^T (I - P) b}, $$

where $P = F(F^T F)^{-1} F^T$ and $I$ is the $N \times N$ identity matrix. Over the conditional sample
space, $b^T F$ is constant, and therefore $Pb$ and $b^T P b$ are constant. Moreover, by
assumption $1 = Fd$ for some $d$, so, because $b$ is binary, $b^T b = b^T 1 = b^T F d$ is constant.
Hence the numerator of $t^*(b, r_z)$ is $r_z^T b$ minus the constant $r_z^T P b$, and the denominator
$b^T b - b^T P b$ is a positive constant. Therefore,

$$ t^*(b, r_z) = k_0 + k_1\, r_z^T b $$

for constants $k_0$ and $k_1 > 0$, so $t^*(b, r_z)$ is a strictly monotone function of $r_z^T b$, as
required to complete the proof.
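The result can be checked numerically on the artificial example of §2.2. The NumPy sketch below (covariate values inferred from Table 2; helper names hypothetical) computes $t^*(b, r_z)$ as the least squares coefficient of $b$ in the regression of $r_z$ on $(b, F)$ for every $b$ in the conditional sample space, fits a line of $t^*$ against $b^T r_z$, and compares the conditional one-sided p-values of the two statistics. A small tolerance is used in the comparisons because the regression coefficients are computed in floating point.

```python
import numpy as np
from itertools import combinations

# Artificial example of §2.2 (x inferred from Table 2): F = (1, x), r_z = 5x + z.
x = np.array([1, 0, 1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
z = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
F = np.column_stack([np.ones_like(x), x])
r = 5 * x + z


def covariance_adjusted(b, r, F):
    """t*(b, r): least squares coefficient of b in the regression of r on (b, F)."""
    coef, *_ = np.linalg.lstsq(np.column_stack([b, F]), r, rcond=None)
    return coef[0]


# Conditional sample space: all b with b^T F = z^T F (two treated units,
# one with x = 1 and one with x = 0).
space = []
for pair in combinations(range(len(x)), 2):
    b = np.zeros_like(x)
    b[list(pair)] = 1.0
    if np.allclose(b @ F, z @ F):
        space.append(b)

totals = np.array([b @ r for b in space])
adjusted = np.array([covariance_adjusted(b, r, F) for b in space])

# Over this space t* is an increasing affine function of b^T r ...
slope, intercept = np.polyfit(totals, adjusted, 1)
print(len(space), np.allclose(adjusted, intercept + slope * totals), slope > 0)

# ... so both statistics give the same conditional one-sided p-value.
p_total = np.mean(totals >= z @ r - 1e-9)
p_adjusted = np.mean(adjusted >= covariance_adjusted(z, r, F) - 1e-9)
print(p_total, p_adjusted)
```

Here $t^*$ turns out to equal $(5\, b^T r_z - 27)/8$ on the conditional sample space, so both statistics single out the observed assignment and give the conditional level $1/25$.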


3. A Clinical Example: Tumor Response in Lung Cancer Patients
3.1. The Conditional Permutation Test

The example in this section illustrates the use of the exact conditional test with
adjustments for several covariates in a small observational comparison. The data are
adapted from a clinical study of lung cancer in which two slight variants of the same
treatment appeared to produce differing tumor response rates. Given the expectation that
this minor variation in the treatment would not alter the response rate, it is natural to
ask to what extent the observed difference in response rates is surprising, given the
characteristics of the patients involved. The data appear in Table 3.


TABLE 3.

Patient  Tumor     Treatment  Cell Type        Previous                  Performance  Subclass
(i)      Response  (z)                         Treatment                 Status
         (r_z)

 1       0         0          Squamous         None                      0            1
 2       0         0          Large cell       None                      1            2
 3       0         0          Squamous         Radiation                 1            3
 4       0         0          Squamous         Radiation                 1            3
 5       0         0          Squamous         Radiation                 2            4
 6       1         1          Squamous         Radiation                 1            3
 7       0         1          Squamous         Radiation                 1            3
 8       1         1          Squamous         None                      1            6
 9       0         1          Adenocarcinoma   Radiation                 1            5
10       0         1          Large cell       None                      2            7
11       0         1          Squamous         Radiation & Chemotherapy  2            8
12       0         1          Squamous         Chemotherapy              1            9
13       0         1          Squamous         None                      0            1
14       2         1          Squamous         None                      1            6

There are three covariates: previous treatment (none, radiation only, chemotherapy
only, radiation plus chemotherapy), cell type (squamous, large cell, adenocarcinoma), and
performance status (grades 0, 1, 2). In the logit model (2.1), previous treatment was
coded as three binary variables, and cell type as two binary variables. The conditional
permutation test considers all reassignments of treatments to patients such that (a) 9
patients receive treatment 1, and of those 9 patients, (b) one has adenocarcinoma, (c)
one has large cell carcinoma, (d) seven have squamous cell carcinoma, (e) three have had
only previous radiation therapy, (f) one has had only previous chemotherapy, (g) one has
had previous radiation and chemotherapy, and (h) the average performance status is
$10/9 \approx 1.1$. There are 28 such treatment reassignments, as compared to $\binom{14}{9} = 2002$
reassignments in the unconditional, randomization sample space.

Tumor response is defined in terms of a reduction in the size of the tumor. In Table 3, no tumor response has been scored as 0, a partial response as 1, and a complete response as 2. All of the tumor responses were observed in patients receiving treatment 1, yielding a total score among patients receiving treatment 1 of $t(z, r) = z^T r = 4$. The conditional permutation distribution of this total under the sharp null hypothesis (1.3) assigns probability 8/28 = .29 to a total of 4, probability 13/28 = .46 to 3, probability 4/28 = .14 to 2, and 3/28 = .11 to 1. The expected total score under the null hypothesis is 2.93. If the two variations of the treatment were in fact identical, and if treatment assignment is strongly ignorable, there would be little reason to be surprised by the observed total response score in treatment group 1, since 29% of all treatment assignments that are similar to the observed treatment assignment would have resulted in a total of 4.
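The conditional distribution quoted above can be reproduced from the same kind of enumeration. In this sketch (hypothetical helper names, data transcribed from Table 3, and the same assumed coding of $z^T F$), every solution is weighted equally, and the one-sided significance level is taken as the proportion of solutions whose total is at least the observed 4.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 3: (response score, treatment, cell type, previous treatment, performance status)
PATIENTS = [
    (0, 0, "squamous", "none", 0), (0, 0, "large cell", "none", 1),
    (0, 0, "squamous", "radiation", 1), (0, 0, "squamous", "radiation", 1),
    (0, 0, "squamous", "radiation", 2), (1, 1, "squamous", "radiation", 1),
    (0, 1, "squamous", "radiation", 1), (1, 1, "squamous", "none", 1),
    (0, 1, "adenocarcinoma", "radiation", 1), (0, 1, "large cell", "none", 2),
    (0, 1, "squamous", "radiation+chemo", 2), (0, 1, "squamous", "chemo", 1),
    (0, 1, "squamous", "none", 0), (2, 1, "squamous", "none", 1),
]


def suff_stat(treated):
    """Hypothetical coding of z^T F for the logit model (2.1)."""
    rows = [PATIENTS[i] for i in treated]
    return (len(rows),
            sum(p[2] == "large cell" for p in rows),
            sum(p[2] == "adenocarcinoma" for p in rows),
            sum(p[3] == "radiation" for p in rows),
            sum(p[3] == "chemo" for p in rows),
            sum(p[3] == "radiation+chemo" for p in rows),
            sum(p[4] for p in rows))


def total_response(treated):
    """Test statistic: total response score in treatment group 1."""
    return sum(PATIENTS[i][0] for i in treated)


observed = [i for i, p in enumerate(PATIENTS) if p[1] == 1]
target = suff_stat(observed)
solutions = [b for b in combinations(range(len(PATIENTS)), len(observed))
             if suff_stat(b) == target]

# Under the null hypothesis and strong ignorability, all solutions are equally likely.
dist = Counter(total_response(b) for b in solutions)
mean = Fraction(sum(t * c for t, c in dist.items()), len(solutions))
t_obs = total_response(observed)
p_exact = Fraction(sum(c for t, c in dist.items() if t >= t_obs), len(solutions))

print(sorted(dist.items(), reverse=True))  # [(4, 8), (3, 13), (2, 4), (1, 3)]
print(round(float(mean), 2), round(float(p_exact), 2))  # 2.93 0.29
```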

3.2. Comparison with Other Tests: The Randomization Test; A Test Based on Subclassification

We now compare the results obtained in §3.1 with the results of two other tests: Fisher's randomization test, and an exact test of zero partial association between the response and the treatment within each of the 4 × 3 × 3 = 36 subclasses defined by the covariates. Fisher's test corresponds to a sample space containing $\binom{14}{9} = 2002$ treatment assignments, with an expected total response score under the null hypothesis of 2.57, and a one-sided significance level of $\binom{11}{6}/\binom{14}{9} = .23$, the probability that all three patients with a tumor response are assigned to treatment 1. Of course, Fisher's test may not be applicable since treatments were not randomly assigned.
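For comparison, Fisher's unconditional randomization test can be computed the same way. The sketch below uses only the responses and treatments transcribed from Table 3 and treats all $\binom{14}{9}$ assignments as equally likely, which is exactly the assumption that may fail in an observational study.

```python
from fractions import Fraction
from itertools import combinations

r = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 2]  # response scores, patients 1-14
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # observed treatments

n_treated = sum(z)
t_obs = sum(ri for ri, zi in zip(r, z) if zi == 1)

# Unconditional randomization sample space: every choice of 9 of the 14 patients.
totals = [sum(r[i] for i in b) for b in combinations(range(len(r)), n_treated)]

mean = Fraction(sum(totals), len(totals))
p_value = Fraction(sum(t >= t_obs for t in totals), len(totals))
print(len(totals), round(float(mean), 2), round(float(p_value), 2))  # 2002 2.57 0.23
```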

An alternative procedure, described in §2.4, is to form 36 subclasses from the three covariates, and to test for partial association within subclasses. There is a 3 × 2 × 36 dimensional contingency table for the 3 response scores, 2 treatments, and 36 subclasses; each of the 36 three-by-two subtables has fixed margins. In this instance, only 9 of the 36 subclasses contain at least one patient; see the last column in Table 3. Six patients fall in subclasses with no other patient, and two patients who fall in the same subclass had both received treatment 1. In effect, none of these 8 patients contribute to the permutation distribution, since their treatments cannot be reassigned subject to the marginal constraints. It is disturbing to note that the one patient with a complete response and one of the two patients with a partial response are among the 8 patients who do not contribute to the test. The conditional sample space contains $\binom{4}{2}\binom{2}{1} = 12$ treatment reassignments, with an expected total response score of 3.5 under the null hypothesis. The one-sided significance level is .5.
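The subclassification test differs only in its sample space: a reassignment is admissible when it preserves the number of treated patients within every subclass. A minimal sketch, using the subclass labels of Table 3 (the helper `treated_per_subclass` is hypothetical):

```python
from fractions import Fraction
from itertools import combinations

r = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 2]         # response scores
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]         # observed treatments
subclass = [1, 2, 3, 3, 4, 3, 3, 6, 5, 7, 8, 9, 1, 6]  # occupied subclasses, labelled 1-9


def treated_per_subclass(treated):
    """Number of treated patients in each subclass (only subclasses with >= 1 treated)."""
    counts = {}
    for i in treated:
        counts[subclass[i]] = counts.get(subclass[i], 0) + 1
    return counts


obs_treated = [i for i, zi in enumerate(z) if zi == 1]
target = treated_per_subclass(obs_treated)
t_obs = sum(r[i] for i in obs_treated)

# Fixed margins: keep reassignments with the observed treated count in every subclass.
space = [b for b in combinations(range(len(z)), len(obs_treated))
         if treated_per_subclass(b) == target]
totals = [sum(r[i] for i in b) for b in space]

mean = Fraction(sum(totals), len(totals))
p_value = Fraction(sum(t >= t_obs for t in totals), len(totals))
print(len(space), float(mean), float(p_value))  # 12 3.5 0.5
```

Only subclasses 1 and 3 vary; the treated pair in subclass 6 contributes a constant 3 to every total, which is why the expected total rises to 3.5.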

The conditional test described in §3.1 has the advantage of permitting adjustment for covariates with fewer restrictions on the conditional sample space than result from the subclassification procedure. Both tests require the assumption of strongly ignorable treatment assignment with covariates x; however, the tests assume different logistic models for the propensity score.

4. Approximations to the Null Distribution

4.1. An Approximation Based on Exact Conditional Moments

This section develops an approximation to the null distribution of $t(z, r)$ using its exact conditional moments. The approximation generalizes the procedures of Mantel and Haenszel (1959) and Mantel (1963).

As noted in §2.3, we need not generate all solutions, $b$, of $b^T F = z^T F$ using the backtrack algorithm; rather, we may generate one solution from each equivalence class of solutions using the backtrack algorithm, and then obtain the other solutions in the same equivalence class by permuting units with identical values of f(x). Often, the number of equivalence classes will be quite small, while the number of individual solutions will be quite large; in the example in §3, there are 28 solutions, but only 3 equivalence classes of solutions. An approximate procedure is to: (a) identify the equivalence classes of solutions using the backtrack algorithm, (b) obtain by standard methods the conditional expectations and variances of the test statistic, $t(z, r)$, within each equivalence class, (c) combine these expectations and variances in an appropriate way to obtain $E = \mathrm{E}\{t(z, r) \mid r, z^T F, X\}$ and $V = \mathrm{var}\{t(z, r) \mid r, z^T F, X\}$ under the null hypothesis, and (d) test the hypothesis (1.3) by referring a suitable standardized deviate, such as $\{t(z, r) - E\}/\sqrt{V}$, to tables of the normal distribution.
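Step (a) can be illustrated by grouping solutions according to how many treated units they place in each subclass. For simplicity the Python sketch below enumerates all assignments rather than running a backtrack search that yields one representative per class; `F_X`, `a_vector`, and `suff_stat` are hypothetical names, and the subclass definitions are those of Table 4. It also checks that each class contains the product, over subclasses, of the binomial coefficients "units in the subclass choose treated units in the subclass", which is the count obtained by permuting units with identical f(x).

```python
from collections import Counter
from itertools import combinations
from math import comb, prod

# f(x) for each subclass j (Table 4): (cell type, previous treatment, performance status)
F_X = {
    1: ("squamous", "none", 0), 2: ("large cell", "none", 1),
    3: ("squamous", "radiation", 1), 4: ("squamous", "radiation", 2),
    5: ("adenocarcinoma", "radiation", 1), 6: ("squamous", "none", 1),
    7: ("large cell", "none", 2), 8: ("squamous", "radiation+chemo", 2),
    9: ("squamous", "chemo", 1),
}
SUBCLASSES = sorted(F_X)
subclass = [1, 2, 3, 3, 4, 3, 3, 6, 5, 7, 8, 9, 1, 6]  # patients 1-14 (Table 3)
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
N = Counter(subclass)  # number of units in each subclass


def a_vector(treated):
    """(a_1, ..., a_J): number of treated units in each subclass."""
    counts = Counter(subclass[i] for i in treated)
    return tuple(counts[j] for j in SUBCLASSES)


def suff_stat(a):
    """z^T F computed from the a-vector alone, since units sharing f(x) are interchangeable."""
    pairs = [(F_X[j], a_j) for j, a_j in zip(SUBCLASSES, a)]
    return (sum(a_j for _, a_j in pairs),
            sum(a_j for fx, a_j in pairs if fx[0] == "large cell"),
            sum(a_j for fx, a_j in pairs if fx[0] == "adenocarcinoma"),
            sum(a_j for fx, a_j in pairs if fx[1] == "radiation"),
            sum(a_j for fx, a_j in pairs if fx[1] == "chemo"),
            sum(a_j for fx, a_j in pairs if fx[1] == "radiation+chemo"),
            sum(a_j * fx[2] for fx, a_j in pairs))


target = suff_stat(a_vector([i for i, zi in enumerate(z) if zi == 1]))
all_a = map(a_vector, combinations(range(len(z)), sum(z)))
classes = Counter(a for a in all_a if suff_stat(a) == target)  # a-vector -> class size

for a, size in classes.items():
    # Class size equals the product of C(N_j, a_j) over subclasses.
    assert size == prod(comb(N[j], a_j) for j, a_j in zip(SUBCLASSES, a))
    print(a, size)
print(len(classes), sum(classes.values()))  # 3 28
```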


Divide the N units into J subclasses or strata based on f(x), where there are J distinct values of f(x). (See for example the last column in Table 3.) Let $N_j$ be the number of units in the jth subclass, and let $\bar r_j$ and $s_j^2$ be the mean and variance of the observed responses of all units in subclass j, where $s_j^2$ is set to zero if $N_j$ equals one, and $s_j^2$ is the sum of squared deviations around $\bar r_j$ divided by $N_j - 1$ if $N_j \ge 2$. The kth equivalence class of solutions may be characterized by a vector $(a_{1k}, a_{2k}, \ldots, a_{Jk})^T$, where $a_{jk}$ is the number of units in subclass j assigned to treatment 1; see Table 4.
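The subclass summaries $N_j$, $\bar r_j$ and $s_j^2$ can be computed directly from the responses. A short Python sketch in exact fractions, using the responses and subclass labels transcribed from Table 3 (the dictionary name `stats` is illustrative):

```python
from collections import defaultdict
from fractions import Fraction

r = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 2]         # response scores
subclass = [1, 2, 3, 3, 4, 3, 3, 6, 5, 7, 8, 9, 1, 6]  # subclass of each patient

groups = defaultdict(list)
for j, rj in zip(subclass, r):
    groups[j].append(Fraction(rj))

stats = {}  # j -> (N_j, mean response, s_j^2)
for j, values in sorted(groups.items()):
    n_j = len(values)
    mean = sum(values) / n_j
    # Sum of squared deviations divided by N_j - 1; set to zero when N_j = 1.
    s2 = sum((v - mean) ** 2 for v in values) / (n_j - 1) if n_j > 1 else Fraction(0)
    stats[j] = (n_j, mean, s2)

for j, (n_j, mean, s2) in stats.items():
    print(j, n_j, mean, s2)
```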


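The following theorem provides expressions for the null expectation E and variance V when $t(z, r) = z^T r$.

Theorem. Suppose that treatment assignment is strongly ignorable, and that (2.1) holds. Then under the null hypothesis (1.3), the expectation and variance of the test statistic $t(z, r) = z^T r$ are

$$E = \sum_k E_k \, p_k \qquad (4.1)$$

$$V = \sum_k V_k \, p_k + \sum_k (E_k - E)^2 \, p_k \qquad (4.2)$$

where

$$E_k = \sum_{j=1}^{J} a_{jk} \, \bar r_j \qquad (4.3)$$

$$V_k = \sum_{j=1}^{J} a_{jk} \, \frac{N_j - a_{jk}}{N_j} \, s_j^2 \qquad (4.4)$$

$$p_k = \frac{\prod_{j=1}^{J} \binom{N_j}{a_{jk}}}{\sum_{l} \prod_{j=1}^{J} \binom{N_j}{a_{jl}}} \qquad (4.5)$$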
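These formulas translate directly into code. The sketch below (hypothetical function `exact_moments`; subclass summaries for the example of §3 and the $a_{jk}$ of Table 4 entered by hand) works in exact fractions. With exact $p_k$ it gives $V = 167/196 \approx .852$; the value .851 in Table 4 is consistent with using the rounded $p_k$ shown there.

```python
from fractions import Fraction
from math import comb, prod

# Subclass summaries for the example: N_j, mean response, and s_j^2.
N = {1: 2, 2: 1, 3: 4, 4: 1, 5: 1, 6: 2, 7: 1, 8: 1, 9: 1}
rbar = {j: Fraction(0) for j in N}
rbar.update({3: Fraction(1, 4), 6: Fraction(3, 2)})
s2 = {j: Fraction(0) for j in N}
s2.update({3: Fraction(1, 4), 6: Fraction(1, 2)})

# a_jk for the three equivalence classes of Table 4.
A = [
    {1: 1, 2: 0, 3: 2, 4: 0, 5: 1, 6: 2, 7: 1, 8: 1, 9: 1},
    {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 0, 8: 1, 9: 1},
    {1: 2, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1},
]


def exact_moments(N, rbar, s2, A):
    """Null expectation and variance of the treatment-1 total from class-level moments."""
    sizes = [prod(comb(N[j], a[j]) for j in N) for a in A]  # solutions per class
    p = [Fraction(n, sum(sizes)) for n in sizes]            # class probabilities
    E_k = [sum(a[j] * rbar[j] for j in N) for a in A]       # stratified-experiment means
    # Stratified-experiment variances, with finite population correction (N_j - a_jk)/N_j.
    V_k = [sum(Fraction(a[j] * (N[j] - a[j]), N[j]) * s2[j] for j in N) for a in A]
    E = sum(pk * ek for pk, ek in zip(p, E_k))
    V = (sum(pk * vk for pk, vk in zip(p, V_k))
         + sum(pk * (ek - E) ** 2 for pk, ek in zip(p, E_k)))
    return p, E_k, V_k, E, V


p, E_k, V_k, E, V = exact_moments(N, rbar, s2, A)
print([round(float(x), 3) for x in p])          # [0.429, 0.286, 0.286]
print([float(x) for x in E_k])                  # [3.5, 3.25, 1.75]
print([float(x) for x in V_k])                  # [0.25, 0.1875, 0.4375]
print(round(float(E), 2), round(float(V), 3))   # 2.93 0.852
```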


Table 4. Calculations for the Approximate Test Based on Exact Moments

Subclass  Cell type       Previous treatment        Performance   a_jk* for equivalence class k
(j)                                                 status        k = 1   k = 2   k = 3
1         Squamous        None                      0             1       1       2
2         Large cell      None                      1             0       1       0
3         Squamous        Radiation                 1             2       1       1
4         Squamous        Radiation                 2             0       1       1
5         Adenocarcinoma  Radiation                 1             1       1       1
6         Squamous        None                      1             2       2       1
7         Large cell      None                      2             1       0       1
8         Squamous        Radiation & chemotherapy  2             1       1       1
9         Squamous        Chemotherapy              1             1       1       1

p_k                                                               .429    .285    .285
E_k                                                               3.50    3.25    1.75
V_k                                                               .250    .188    .438

$E = 2.93$, $V = .851$, $(|z^T r - E| - \tfrac{1}{2})/\sqrt{V} = .62$, $1 - \Phi(.62) = .27$

* $a_{jk}$ = number of units in subclass j assigned to treatment 1 in the kth equivalence class of solutions.
Remark: The probability $p_k$ is the proportion of all solutions of $b^T F = z^T F$ that fall in the kth equivalence class. The expectation, $E_k$, and variance, $V_k$, corresponding to the kth equivalence class of solutions, are the expectation and variance of the treatment 1 total, $z^T r$, in a stratified randomized experiment in which $a_{jk}$ of the $N_j$ units in subclass j are randomly assigned to treatment 1. In the variance, (4.4), the factor $(N_j - a_{jk})/N_j$ is a finite population correction. If there is only one equivalence class, then E and V are the expectations and variances appearing in the Mantel-Haenszel (1959) approximation for binary responses (see also Birch (1964, §4) and Cox (1966, §3)), and in the Mantel (1963) approximation for scored responses (see also Birch (1965, §5)).

Proof: Let $C_k$ be the kth equivalence class of solutions. Clearly,

$$E = \sum_k \mathrm{E}(z^T r \mid r = c, z^T F, X, z \in C_k) \, \mathrm{pr}(z \in C_k \mid r = c, z^T F, X) = \sum_k \mathrm{E}(z^T c \mid X, z \in C_k) \, \mathrm{pr}(z \in C_k \mid r = c, z^T F, X)$$

for each constant c, by (1.2) and (1.3), and the fact that $z^T F$ is constant for all solutions, and in particular is constant for all solutions in $C_k$. By Theorem 1.1 and simple combinatorial arguments, it follows that $\mathrm{pr}(z \in C_k \mid r, z^T F, X) = p_k$. Now, all solutions in $C_k$ assign $a_{jk}$ units from subclass j to treatment 1, and moreover, all solutions in $C_k$ are equally probable by Theorem 1.1, so with c equal to the observed response r, the permutational expectation $\mathrm{E}(z^T c \mid X, z \in C_k)$ equals $E_k$. We have proved (4.1). Similarly,

$$V = \sum_k \mathrm{var}(z^T r \mid r, X, z \in C_k) \, \mathrm{pr}(z \in C_k \mid r, z^T F, X) + \mathrm{var}\{\mathrm{E}(z^T r \mid r, z^T F, X, z \in C_k) \mid r, z^T F, X\} = \sum_k V_k \, p_k + \sum_k (E_k - E)^2 \, p_k,$$

as required. //
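The variance step in the proof is the law of total variance over equivalence classes, and it can be checked numerically. This sketch (hypothetical names, data from Table 3, the same assumed coding of $z^T F$ as before) enumerates the 28 solutions, computes the variance of $z^T r$ directly, and compares it with the within-class plus between-class terms.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

r = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 2]         # response scores
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]         # observed treatments
subclass = [1, 2, 3, 3, 4, 3, 3, 6, 5, 7, 8, 9, 1, 6]  # subclasses
# f(x) by subclass: (cell type, previous treatment, performance status)
F_X = {1: ("squamous", "none", 0), 2: ("large cell", "none", 1),
       3: ("squamous", "radiation", 1), 4: ("squamous", "radiation", 2),
       5: ("adenocarcinoma", "radiation", 1), 6: ("squamous", "none", 1),
       7: ("large cell", "none", 2), 8: ("squamous", "radiation+chemo", 2),
       9: ("squamous", "chemo", 1)}


def suff_stat(treated):
    """z^T F: number treated, dummy-variable totals, and total performance status."""
    fx = [F_X[subclass[i]] for i in treated]
    return (len(fx),
            sum(cell == "large cell" for cell, _, _ in fx),
            sum(cell == "adenocarcinoma" for cell, _, _ in fx),
            sum(prev == "radiation" for _, prev, _ in fx),
            sum(prev == "chemo" for _, prev, _ in fx),
            sum(prev == "radiation+chemo" for _, prev, _ in fx),
            sum(ps for _, _, ps in fx))


def mean_var(values):
    """Mean and variance of equally likely values (divisor = number of values)."""
    m = Fraction(sum(values), len(values))
    return m, sum((v - m) ** 2 for v in values) / len(values)


target = suff_stat([i for i, zi in enumerate(z) if zi == 1])
by_class = defaultdict(list)  # a-vector of the class -> totals z^T r of its solutions
for b in combinations(range(len(z)), sum(z)):
    if suff_stat(b) == target:
        a = tuple(sum(subclass[i] == j for i in b) for j in sorted(F_X))
        by_class[a].append(sum(r[i] for i in b))

all_totals = [t for ts in by_class.values() for t in ts]
E, V = mean_var(all_totals)  # direct moments over all solutions

n = len(all_totals)
within = sum(Fraction(len(ts), n) * mean_var(ts)[1] for ts in by_class.values())
between = sum(Fraction(len(ts), n) * (mean_var(ts)[0] - E) ** 2 for ts in by_class.values())
print(V == within + between, V)  # True 167/196
```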


Table 4 illustrates the procedure for the example in §3. The approximate significance level is .27, compared to the exact significance level of .29.
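The final step can be sketched as follows. The ½ continuity correction is an assumption inferred from the entries of Table 4 (it reproduces the deviate .62 and the level .27); the normal tail is computed with `math.erfc`, and E and V are the exact-fraction values 41/14 and 167/196 for this example.

```python
from fractions import Fraction
from math import erfc, sqrt


def normal_upper_tail(x):
    """1 - Phi(x) for a standard normal deviate."""
    return 0.5 * erfc(x / sqrt(2))


t_obs = 4                  # observed total response in treatment group 1
E = Fraction(41, 14)       # null expectation, about 2.93
V = Fraction(167, 196)     # null variance, about .852

# Continuity-corrected standardized deviate (assumed correction of 1/2).
deviate = (abs(t_obs - E) - Fraction(1, 2)) / sqrt(V)
p_approx = normal_upper_tail(float(deviate))
p_exact = Fraction(8, 28)  # exact conditional significance level

print(round(float(deviate), 2), round(p_approx, 2), round(float(p_exact), 2))  # 0.62 0.27 0.29
```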

4.2. A Large Sample Approximation

This section describes a large sample approximation to the test defined in Theorem 1 when the test statistic is the total response in treatment group 1, that is, when $t(z, r) = z^T r$. Let $v = \log[\mathrm{pr}(z = 1 \mid x, r)/\{1 - \mathrm{pr}(z = 1 \mid x, r)\}]$, and let $\mathbf{v}$ be the corresponding vector for the N units under study. Consider the following logistic model for $\mathbf{v}$:

$$\mathbf{v} = F\beta + r\theta \qquad (4.6)$$

where $\beta$ and $\theta$ are, respectively, unknown vector and scalar parameters. Note that (1.2), (1.3) and (2.1) imply $\theta = 0$ in (4.6). Indeed, the exact test defined in Theorem 1 is, under the null hypothesis (1.3), formally identical to the exact, uniformly most powerful similar region test, described by Cox (1970, §4.2), of the hypothesis that $\theta = 0$. To demonstrate the equivalence, it is sufficient to note that the test statistics, the null distributions, and hence the critical regions are the same. Since the tests are equivalent for every finite sample, their asymptotic properties under the null hypothesis are also identical, so a test of (1.3) may be based on the familiar large sample properties of tests of $\theta = 0$ in (4.6); see Cox (1970, §6.4) for discussion of these tests.